Benchmark GPT-tfjs #659
Conversation
thanks, very interesting!

@tharvik I'd be curious to hear your opinion on a few things:
superb! that's very nice to have metrics on what we're doing, thanks!
> Where do you think we should report the benchmark? I was thinking of reporting them all in this PR and linking to it where relevant (e.g. in gpt/config.ts or the GPT class docstring)
The nicest thing would be to be able to generate such metrics via the CLI, with the example output of the command being what you have here (not the tables, I mean, but the same content).
> Benchmarking performance requires modifying the gpt source code to keep track of memory. Do you think it's worth keeping around, or should we leave the benchmark on this branch and not merge it?
Not merging it means that it'll slowly drift out of date. I think adding the memory usage to the EpochLogs as you did is the way to go; see my comments related to it.
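For context, a minimal sketch of what tracking memory in the epoch logs could look like; the field names here are hypothetical, not disco's actual EpochLogs definition:

```ts
import * as tf from '@tensorflow/tfjs'

// Hypothetical shape: the project's real EpochLogs type may differ.
interface EpochLogs {
  epoch: number
  trainingLoss: number
  validationLoss?: number
  // New field: peak bytes observed via tf.memory() during the epoch.
  peakMemoryBytes?: number
}

// Sampling tf.memory() after each update keeps the benchmark logic
// inside the normal training loop instead of on a separate branch.
function recordMemory(logs: EpochLogs): EpochLogs {
  return {
    ...logs,
    peakMemoryBytes: Math.max(logs.peakMemoryBytes ?? 0, tf.memory().numBytes),
  }
}
```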
The memory values will need to be slightly updated when #807 is merged.

up or down? ;)

A very superficial benchmark showed a 10-20% decrease in memory usage!
Training
Benchmarks were run on a 2022 MacBook Air M2 with 16 GB of RAM.
To reproduce, check out 58f018f and run, for example:
`npm -w cli run benchmark_gpt -- --contextLength 128 --batchSize 8`
Time per token is obtained by measuring the time of 10 training update iterations and dividing by (batch size × context length).
Memory values are the maximum of the memory allocated during the attention mechanism and the memory allocated after computing the gradients; so far, the attention mechanism has always had the higher memory requirement. The actual peak memory allocated during training may differ, but tfjs doesn't expose this information easily.
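For illustration, here is a minimal sketch of such a measurement loop. The model setup, `vocabSize`, and the use of `trainOnBatch` are assumptions made for the example, not the actual benchmark_gpt implementation:

```ts
import * as tf from '@tensorflow/tfjs-node'

// Illustrative stand-ins: `model` represents the compiled gpt-tfjs model.
declare const model: tf.LayersModel
const vocabSize = 50257
const batchSize = 8
const contextLength = 128
const iterations = 10

async function benchmarkTraining(): Promise<void> {
  // Random token ids shaped like a real training batch.
  const xs = tf.randomUniform([batchSize, contextLength], 0, vocabSize, 'int32')
  const ys = tf.randomUniform([batchSize, contextLength], 0, vocabSize, 'int32')

  let peakBytes = 0
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    await model.trainOnBatch(xs, ys)
    // tf.memory().numBytes reports currently allocated bytes, sampled
    // between updates; the true peak inside an op is not observable.
    peakBytes = Math.max(peakBytes, tf.memory().numBytes)
  }
  const elapsedMs = performance.now() - start

  console.log(`time/token: ${(elapsedMs / (iterations * batchSize * contextLength)).toFixed(3)} ms`)
  console.log(`max observed memory: ${(peakBytes / 1e9).toFixed(2)} GB`)
  xs.dispose()
  ys.dispose()
}
```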
I leave cells empty where I deemed the benchmark too slow to run. If needed, missing values can be extrapolated: in the tables below, memory usage roughly doubles whenever the batch size doubles.

gpt-nano
| | batch_size=8 | batch_size=16 | batch_size=32 | batch_size=64 |
| --- | --- | --- | --- | --- |
| context_length=128 | 0.33 GB | 0.56 GB | 1.12 GB | 2.18 GB |
| context_length=256 | 0.64 GB | 1.22 GB | 2.36 GB | 4.66 GB |
| context_length=512 | 1.42 GB | 2.75 GB | 5.42 GB | |
| context_length=1024 | 3.56 GB | 6.98 GB | | |
| context_length=2048 | 10.2 GB | | | |
gpt-micro

| | batch_size=8 | batch_size=16 | batch_size=32 |
| --- | --- | --- | --- |
| context_length=128 | 0.6 GB | 1 GB | 1.86 GB |
| context_length=256 | 1.1 GB | 2 GB | 3.8 GB |
| context_length=512 | 2.3 GB | 4.4 GB | |
| context_length=1024 | 5.8 GB | | |
gpt-mini

| | batch_size=8 | batch_size=16 |
| --- | --- | --- |
| context_length=128 | 1 GB | 1.75 GB |
| context_length=256 | 1.9 GB | 3.5 GB |
gpt2

| | batch_size=8 |
| --- | --- |
| context_length=128 | 7.7 GB |
| context_length=256 | 12.7 GB |
Comparisons
Using the Python nanoGPT benchmark script on the same machine, I get the following comparisons between Python and JS:
| gpt-nano | gpt-tfjs | python (nanoGPT repo) |
| --- | --- | --- |
| batch_size=8 and context_length=128 | | |
| batch_size=32 and context_length=512 | | |
Inference
Run
npm -w cli run benchmark_gpt -- --inference --modelPath <path to trained model>
For
gpt-nano
trained with context length 128, inference time averages between 6 and 8 ms/token.WebGPT reports 3 ms/token at 5M parameters, which is between gpt-nano (2.5M) and gpt-micro (7.2M). They also managed to scale up to 1.5B parameters on a M1 Mac with WebGPU.
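A rough sketch of how such a per-token figure can be measured is below; `generateNextToken` is a hypothetical stand-in for the model's autoregressive decoding step, not an actual gpt-tfjs function:

```ts
import * as tf from '@tensorflow/tfjs-node'

// Hypothetical decoding step: given the model and the tokens generated so
// far, returns the next token id. The real gpt-tfjs generation API may differ.
declare function generateNextToken(
  model: tf.LayersModel,
  tokens: number[],
): Promise<number>

async function msPerToken(
  model: tf.LayersModel,
  prompt: number[],
  nTokens = 100,
): Promise<number> {
  const tokens = [...prompt]
  const start = performance.now()
  for (let i = 0; i < nTokens; i++) {
    tokens.push(await generateNextToken(model, tokens))
  }
  // Average wall-clock time per generated token, in milliseconds.
  return (performance.now() - start) / nTokens
}
```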